R Markdown

Organized by aligner

Bowtie- Cheyenne

limma

bowtie_limma_MA <- knitr::include_graphics("figures/Bowtie_limma_MA_Moore.png") 

DESeq

bowtie_deseq_MA <- knitr::include_graphics("figures/bowtie2_deseq2_maplot.png") 

EdgR

bowtie_edger_MA <- knitr::include_graphics("figures/Bowtie2EdgeR_McGauley.png") 

All Bowtie

bowtie_limma_MA

bowtie_deseq_MA

bowtie_edger_MA

all_bowtie<- c(bowtie_limma_MA, bowtie_deseq_MA, bowtie_edger_MA)

all_bowtie
## [1] "figures/Bowtie_limma_MA_Moore.png" "figures/bowtie2_deseq2_maplot.png"
## [3] "figures/Bowtie2EdgeR_McGauley.png"

Based on initial viewing of the MA plots where Bowtie2 was the aligner used it looks like the number of differentially expressed genes varies with DE program (limma, DESeq, Edgr). Limma appears to have the least differentually expressed genes. DESeq and Edgr have more DE genes (about the same when comparing the two) but DESeq seems to have genes that are more differentially expressed, eg. further towards 1 and -1 on the y axis.

Kallisto - Jordan

limma

DESeq

EdgR

kallisto_limma_MA<- knitr::include_graphics("figures/Simpson.png") 
kallisto_edger_MA<- knitr::include_graphics("figures/MA_kallisto_edgeR.png")
kallisto_deseq_MA<- knitr::include_graphics("figures/kallistoDESeq_Sussman.png")
kallisto_edger_MA

kallisto_limma_MA

kallisto_deseq_MA

all_kallisto_MA<- c(kallisto_edger_MA, kallisto_limma_MA, kallisto_deseq_MA)

Kallisto Limma had one differentially expressed transcript at the p value of 0.01, whereas edgeR and deseq had more differentially expressed transcripts, that seemed comparable to each other. This would support the notion that DE methods make the difference when it comes to differentially expressed transcripts. This is essentially the same thing that the other aligners have mentioned/saw.

Salmon - Katie

limma

DESeq

EdgeR

salmon_limma_MA <- knitr::include_graphics("figures/McKinleySalmonLimmaMAPlot.png")
salmon_deseq_MA <- knitr::include_graphics("figures/salmon_deseq2_maplot.png") 
salmon_edger_MA <- knitr::include_graphics("figures/KE_Salmon_EdgeR_MAPlot.png") 

salmon_limma_MA

salmon_deseq_MA

salmon_edger_MA

all_salmon_MA <- c(salmon_limma_MA,salmon_deseq_MA,salmon_edger_MA)

Different aligners identified different numbers of differentially expressed genes, with DESeq finding the least and EdgeR finding the most, though it’s fairly comparable to Limma. DESeq only found 2 significantly up regulated genes and no significantly down regulated genes. All of the genes with this aligner were expressed at lower levels than the others (smaller y axis), though the spread was larger (x axis). Overall, Limma and EdgeR have similar results, but EdgeR identified more genes closer to the x-axis that were differentially expressed compared to Limma, resulting in a curve toward 0.

SailFish - Matt

limma

DESeq

EdgR

sailfish_edger_MA <- knitr::include_graphics("figures/JustinKoss.sailfish.edgeR.png") 
sailfish_limma_MA <- knitr::include_graphics("figures/Ritter_DEBrowser_image.png") 
sailfish_deseq_MA <- knitr::include_graphics("figures/Vogel_Sailfish_DESeq2_MAplot.png") 
sailfish_edger_MA

sailfish_limma_MA

sailfish_deseq_MA

It appears that Limma has the least differentially expressed genes out of these three pipelines. EdgeR and DESeq2 have similar amounts of differentially expressed genes in terms of the total number of differentially expressed genes, but DESeq certainly had the most. These two pipelines also seem to have a more normal distribution of differentially expressed genes. There are high and low levels of differential gene expression across different gene abundances. But, in the EdgeR pipeline, it seems that there is a correlation between gene abundance and the the level of differential expression. As the gene becomes more abundant, its level of differential expression decreases. DESeq combines both of these aspects. There is a slight correlation between differential expression and gene abundance at lower abundances, but as abundance increases, the level of differential expression becomes more normally distributed.

Organized by DE Program

bowtie kallisto salmon sailfish

limma

bowtie_limma_MA

kallisto_limma_MA

salmon_limma_MA

sailfish_limma_MA

Of the aligners used with limma, Kallisto had only one differentially expressed gene. Bowtie2 and Sailfish looked very similar though Sailfish seems to have more points closer to 0 on th y axis. Salmon had the most DE genes.

DESeq

bowtie_deseq_MA

kallisto_deseq_MA

salmon_deseq_MA

sailfish_deseq_MA

For the DESeq aligner salmon only found 2 DE genes, both underexpressed. This is interesting because with limma salmon found the most DE genes. Sailfish seems to have the most genes, but Sailfish, Kallisto, and Bowtie2 seem to all yield similar results. With Kallisto and Sailfish there were several over and under expressed genes beyonf 1 or -1 on the y axis.

EdgeR

bowtie_edger_MA

kallisto_edger_MA

salmon_edger_MA

sailfish_edger_MA

To rank the aligners from most DE genes to least when in combination with EdgeR: Bowtie2 and Salmon with the most, then Sailfish, then Kallisto. However, Sailfish showed the most highly expressed DE genes. Kallisto also has more highly expressed DE genes (closer to 1 and -1). So the two aligners that had the least DE genes, had the most differentially expressed.

Least and Most DE Genes

By aligner

Bowtie most: EdgeR? but DESeq has more genes expressed near 1 or -1 Bowtie least: Limma

Kallisto most:DESeq Kallisto least:limma

Salmon most:EdgeR Salmon least:DESeq

Sailfish most: DESeq Sailfish least: Limma

aligner <- c("Bowtie","Kallisto","Salmon","Sailfish")
mostDEaligner <- c("EdgeR","DESeq","EdgeR","DESeq")
leastDEaligner <- c("Limma","Limma","DESeq","Limma")
DE_by_aligner <-data.frame(Aligner=aligner,
                           Most_DE_Genes=mostDEaligner,
                           Least_DE_Genes=leastDEaligner)
DE_by_aligner
##    Aligner Most_DE_Genes Least_DE_Genes
## 1   Bowtie         EdgeR          Limma
## 2 Kallisto         DESeq          Limma
## 3   Salmon         EdgeR          DESeq
## 4 Sailfish         DESeq          Limma

By DE Program

Limma most: Salmon Limma least:Kallisto

DESeq most: Sailfish DESeq least:Salmon

EdgeR most:Bowtie or Salmon EdgeR least: Kallisto

program <- c("Limma","DESeq","EdgeR")
mostDEprogram <- c("Salmon","Sailfish","Bowtie/Salmon")
leastDEprogram <- c("Kallisto","Salmon","Kallisto")
DE_by_program <-data.frame(DE_Program=program,
                           Most_DE_Genes=mostDEprogram,
                           Least_DE_Genes=leastDEprogram)
DE_by_program
##   DE_Program Most_DE_Genes Least_DE_Genes
## 1      Limma        Salmon       Kallisto
## 2      DESeq      Sailfish         Salmon
## 3      EdgeR Bowtie/Salmon       Kallisto

Overall Summary:

In conclusion, various pipelines produce varying results. When looking at DE Programs it seems like Kallisto generally finds fewer differentially expressed genes. It looks like Salmon as an aligner was pretty good for 2/3 DE programs. It’s interesting that since Salmon and Kallisto are so similar that a major difference was seen in DE transcripts for Limma (Limma-Salmon: most, Limma-Kallisto: least).

I have not notice a clear trend yet…